Attribute tagging and matching system and method for database management

ABSTRACT

An attribute language and attribute tagging and matching system is used to permit fine-grained data searches based on overall similarity using attribute values and weightings. The searches use similarity to specific attributes, or similarity to combinations of attributes. The system includes an attribute language and a structure for inputting attribute values. It includes “relevance” weightings that are used to refine calculations of the similarity between specific products. These weightings substantially increase the precision of search results and the usefulness of the order in which search results are presented. The invention further includes a system for tuning search results. This system includes the application of “must match” checkboxes to particular attributes, coupled with “importance” weightings applied to any or all attributes, including those that are not checked as “must match.”

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] This application claims priority of U.S. provisional application Ser. No. 60/183,709 entitled “Attribute Tagging and Matching System and Method for Database Management,” filed Feb. 18, 2000 by the present applicant.

FIELD OF THE INVENTION

[0002] This invention relates generally to computer databases and more particularly to searching and retrieving data from a database using attribute tags and weights.

BACKGROUND OF THE INVENTION

[0003] With the improvements in computer capabilities, there has been an exponential increase in data stored in databases. Once stored, data needs to be accessible, preferably quickly and to a fine degree of granularity. The vast amounts of data stored in databases drive a need for easy-to-navigate databases and for efficient and specific data retrieval. The explosive growth of commerce on the Internet, both consumer and business-to-business, has linked many commercial databases of diverse structure and content.

[0004] Effective searching is a particular problem if the data items are image-based or based on some other type of data object rather than character-based (that is, words or numbers). Alphanumeric character-based data allows string searches. Images and other data objects, by themselves, are generally not searchable by data string search or the like. Data objects such as pictorial works or representations of real-world physical objects are indexed, if at all, in diverse ways that do not lend themselves to searches over large aggregates of such data objects or over distributed databases.

[0005] It remains desirable to have a system and method for searching databases effectively, particularly databases storing non-character-based data objects.

[0006] It is an object of the present invention to provide a method and apparatus that use an indexing system of attributes assigned subjective weights to search an image-based database easily and efficiently.

[0007] It is another object of the present invention to provide a method and apparatus that enable consumers, retailers, integrators, designers and others to navigate a product database effectively over the Web.

SUMMARY OF THE INVENTION

[0008] The problems of retrieving data objects from a database are solved by the present invention of an attribute tagging and matching system and method for database management.

[0009] The present invention has an attribute language and attribute tagging and matching system in which attribute values and weightings are used to permit fine-grained data searches based on overall similarity. Similarity is measured by closeness of matching of specific attribute values or of combinations of attributes. The system includes an attribute language syntax that provides structure for assigning attribute values. It includes “relevance” weightings for values of certain attributes that have multiple values for a particular product; these are used to refine calculations of the similarity between specific products. These relevance weightings substantially increase the precision of search results and the usefulness of the order in which search results are presented.

[0010] The invention further includes a system for tuning search results. This system includes the application of “must match” checkboxes to particular attributes, coupled with “importance” weightings (importance relative to other attributes, as distinct from “relevance” within a given attribute) applied to any or all attributes, including those that are not checked as “must match.” Search results may be tuned across an entire database category as a default setting, or at the level of a single database entry.

[0011] The present invention also enables a user to organize and store data items in a personal database and to make that personal database accessible for data retrieval by others.

[0012] In the embodiment described herein, the attribute system is applied to a database related to home design. The system, however, is applicable to and valuable for any complex database of images and/or products.

[0013] The present invention, together with the above and other advantages, may best be understood from the following detailed description of the embodiments of the invention illustrated in the drawings, wherein:

BRIEF DESCRIPTION OF THE DRAWINGS

[0014] FIG. 1 is a block diagram of the attribute tagging and matching system according to principles of the invention;

[0015] FIG. 2 is a block diagram of a record of an object in the image/product database of FIG. 1;

[0016] FIG. 3 is an example of a record of an object using the structure shown in FIG. 2;

[0017] FIG. 4 is a screen shot of an introductory screen showing a list of categories according to principles of the invention;

[0018] FIG. 5 is a screen shot of the category of bathroom fixtures and fittings selected from the list of FIG. 4;

[0019] FIG. 6 is a screen shot of the tubs category selected from the list of FIG. 5;

[0020] FIG. 7 is a list of the tubs category resulting from a selection made from the options shown in FIG. 6;

[0021] FIG. 8 is a screen shot of a tub item selected from the list of FIG. 7;

[0022] FIG. 9 is a list of tubs resulting from a “Find Similar” search on the tub of FIG. 8;

[0023] FIG. 10 is a block diagram of a data structure for a “Find Similar” search using the object of FIG. 3;

[0024] FIG. 11 is a pair of tables of attributes and values, one table of a source object and one table of a target object in the database;

[0025] FIG. 12 is a flow chart of the “Find Similar” process according to principles of the invention; and

[0026] FIG. 13 is a diagram demonstrating the use of the “My Portfolio” database according to principles of the invention.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

[0027] FIG. 1 is a block diagram of the attribute tagging and matching system according to principles of the invention. The system 10 has a database 15, a search system 20 and a user portfolios database 25, available to users 30 over the Internet 35. In the present embodiment of the invention, the database stores product information using a schema that will be described below with regard to FIG. 2. In alternative embodiments of the invention, other data objects may be used, such as graphics objects or music objects. The items in the database have associated attributes and tags that the search system uses to navigate the database. Users access the system over the Internet, search the database, create search results, refine search results, and create individual user portfolios in the user portfolios database. In the user portfolios, the user may attach individual user ratings, such as “Love It,” “Like It,” or “Not My Style,” to database items in order to individualize searches of the database.

[0028] FIG. 2 is a block diagram of a record for an object in the database. Each item in the database belongs to one or more categories 55. Each item also has attributes 60 and associated values 65. There are four attribute types: general and unweighted 70; category-specific and unweighted 75; general and relevance weighted 80; and category-specific and relevance weighted 85. A general attribute applies to a plurality of categories, although not to all categories in the database. A category-specific attribute applies only to a single category. The weighted attributes 80, 85 also have relevance values 90, which are described below.

[0029] Unweighted Attributes

[0030] Unweighted attributes 70, 75 are those descriptors that have a true/false quality, that is, either the descriptor applies to the item or it does not. Unweighted attributes are, for example, basic information attributes such as manufacturer, product line, product name, model number, list price and materials. For example, attributes and selected attribute values for a product that is a “dresser” may be as follows:
Material = wood, walnut
Finish = shellac, satin, natural
Number of drawers = six
Height = 72″
Width = 36″
Depth = 18″
Leg style = ball-in-claw
Top edge = bullnose

[0031] These attributes are typically selected using check boxes on a complete attribute list for that category, using pop-up lists, or using scrolling lists. Attributes that are not easily generalized may be input as typed-in text (as for precise dimensions).

[0032] Attributes may have multiple values (for example, the material is both wood and walnut, and the finish is simultaneously shellac, satin, and natural).

[0033] Certain unweighted attributes are not generic, but rather are category-specific, product-specific, manufacturer-specific or belong to some other type of grouping. These “category-specific” attributes are displayed only when the user of the system is browsing in the specific area where the attribute applies.

[0034] Relevance Weighted Attributes

[0035] Relevance weighted attributes 80, 85 are those attributes where the descriptor applies to the product to some degree. The value indicating how much or how little the attribute applies is the “relevance” weight assigned to that attribute. Relevance weighted attributes include, for example, the space in which the product appropriately belongs, such as the kitchen, the bath, the family room, the bedroom, the home office, outdoors, the living room, the dining room, the sunroom, the exercise room, the library, the porch, the home theatre, the spa/pool, or the laundry. For each product, each of these room attributes would be assigned a numerical weight, that is, a value that indicates how well the product fits in that room.

[0036] In the present embodiment of the invention, the relevance weights are blank at initialization of the products database, and are either left blank or are assigned a five-point weighting on a scale from zero to four. Typically the weightings are assigned using pop-up lists, although in alternative embodiments of the invention, other methods may also be used. Using, for example, the relevance weighted attribute of the space (or room) in which the product appropriately belongs: a value of four means that the product always goes in the attribute room and rarely in another; a value of three means that the product goes well in the attribute room and in other rooms; a value of two means that this product sometimes goes in the attribute room; a value of one means that the product rarely goes in the attribute room; a value of zero means that the product never goes into the attribute room; and a blank means that the attribute is not weighted.

[0037] Other scales may be used effectively within the scope of the present invention. Alternative scales include alphabetical scales like A to F and numerical scales using different ranges, such as 1 to 10 or −5 to +5. The specific scale may be modified over time and may be further tuned using global and category-specific business rules. Variations include mapping the weightings to numbers non-linearly, or in a way that converts median weightings into zero values and below-median weightings into negative values.
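As a hedged illustration of the variations in paragraph [0037], the sketch below keeps the 0 to 4 relevance scale of paragraph [0036], represents a blank weight as None, and offers two re-mappings: a median-centered one (2 becomes 0, lower weightings go negative) and one possible non-linear one. The function names and the choice of squaring for the non-linear variant are assumptions for illustration, not the scheme the specification prescribes.

```python
from typing import Optional

# Hypothetical helpers: the 0-4 scale and the "blank" (None) value come from
# paragraph [0036]; the two transforms are illustrative assumptions.
MIN_WEIGHT, MAX_WEIGHT, MEDIAN_WEIGHT = 0, 4, 2


def validate(weight: Optional[int]) -> Optional[int]:
    """Return the weight unchanged; None stands for a blank (unweighted) entry."""
    if weight is not None and not MIN_WEIGHT <= weight <= MAX_WEIGHT:
        raise ValueError(f"relevance weight {weight} is outside 0-4")
    return weight


def median_centered(weight: Optional[int]) -> Optional[int]:
    """Map the median weighting to 0 and below-median weightings to negatives."""
    w = validate(weight)
    return None if w is None else w - MEDIAN_WEIGHT


def nonlinear(weight: Optional[int]) -> Optional[float]:
    """One possible non-linear mapping: square the weight normalized to [0, 1]."""
    w = validate(weight)
    return None if w is None else (w / MAX_WEIGHT) ** 2


print([median_centered(w) for w in (0, 1, 2, 3, 4, None)])
# [-2, -1, 0, 1, 2, None]
```

Keeping blank as None rather than 0 preserves the distinction the text draws between “never goes in this room” (0) and “not weighted” (blank).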

[0038] Other examples of relevance weighted attributes include style attributes. General style attributes include descriptors such as “traditional,” “country,” “rustic,” “hip,” “modern,” “contemporary,” “Arts & Crafts,” “formal,” “casual,” and “romantic.” Further examples of general relevance weighted attributes are attributes that may be recorded and used to affect the order or groupings in which products are presented, including image quality, scan quality, overall quality of materials, overall quality of workmanship, overall quality of the product's design, how closely the qualities of a particular product coincide with a particular organization's brand identity, and how much a specific editor, celebrity, or consumer likes a database entry.

[0039] Certain relevance weighted attributes are category specific. An example of this is the style attribute of the home exteriors category. The styles include, for example, Shingle-Style, Colonial, Cape, Shingle-Style Colonial, Victorian, Queen Anne Victorian, and Bauhaus. As with all other weighted attributes, each of these category-specific style attributes receives a relevance weighting, such as blank, or 0 to 4, in which 0 means the attribute is not relevant and 4 means the attribute is highly relevant.

[0040] The relevance weighted attribute architecture for general and category-specific attributes enables the system to represent numerically the reality that products, collections, scenes, and whole houses are often multiple things, not one thing. For example, a house could be simultaneously Shingle-Style, Colonial, Shingle-Style Colonial, and Victorian.

[0041] The numerical weightings further differentiate these individual observations. For example, the relevance weightings might be as follows: Shingle-Style (4), Colonial (2), Shingle-Style Colonial (3), Victorian (2), Queen Anne Victorian (1), and Bauhaus (0).

[0042] FIG. 3 is an example data record for a particular house. The house falls under the category of “whole houses.” The general and unweighted attributes include the architect, the builder, and the photographer. The category-specific and unweighted attributes include the type of siding, the number of bedrooms, the number of bathrooms, and the square footage. The attributes that are general and relevance weighted include quality of design, quality of materials, quality of workmanship, overall style, and colors. The category-specific and relevance weighted attributes include house style. Certain attributes may have a plurality of values, such as the type of siding attribute and the overall style attribute.

[0043] Each attribute value under the relevance weighted attributes has an associated relevance weighting. The relevance weighting indicates the degree to which the attribute applies to the item in the record. For example, the house style attribute, under the category-specific and weighted attribute type, has four values: Shingle-Style, Colonial, Cape, and Modernist. The relevance weight of the value Shingle-Style is “4,” meaning that this value is highly relevant to the description of the house. The relevance weight of the attribute value Colonial is “2,” meaning that the descriptor applies but is only somewhat relevant to the description of the house. The relevance weight of the attribute value Cape is “1,” meaning that this descriptor is only slightly relevant to the house. The relevance weight of the attribute value Modernist is “0,” meaning that this descriptor has no relevance to this house.
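The record layout of FIG. 2, populated with the house style attribute of the FIG. 3 record, can be sketched with Python dataclasses. This is a minimal in-memory model under stated assumptions: the class and field names are hypothetical, each attribute value maps to its relevance weight, and None stands for an unweighted value or a blank weight. It is not the storage schema of the described system.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class AttributeType(Enum):
    """The four attribute types of FIG. 2, keyed by their reference numerals."""
    GENERAL_UNWEIGHTED = 70
    CATEGORY_UNWEIGHTED = 75
    GENERAL_WEIGHTED = 80
    CATEGORY_WEIGHTED = 85


WEIGHTED_TYPES = {AttributeType.GENERAL_WEIGHTED, AttributeType.CATEGORY_WEIGHTED}


@dataclass
class Attribute:
    name: str
    kind: AttributeType
    # Each value maps to its relevance weight; None for unweighted attributes
    # or blank weights.
    values: Dict[str, Optional[int]] = field(default_factory=dict)

    @property
    def weighted(self) -> bool:
        return self.kind in WEIGHTED_TYPES


@dataclass
class Item:
    name: str
    categories: List[str]
    attributes: Dict[str, Attribute] = field(default_factory=dict)

    def add(self, attribute: Attribute) -> None:
        self.attributes[attribute.name] = attribute


# The house style attribute of the FIG. 3 record, with the weights from [0043].
house = Item("example house", ["whole houses"])
house.add(Attribute("house style", AttributeType.CATEGORY_WEIGHTED,
                    {"Shingle-Style": 4, "Colonial": 2, "Cape": 1, "Modernist": 0}))
```

Storing the weight on the value, rather than on the attribute, is what lets one attribute carry several values to different degrees, as the house style example requires.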

[0044] The relevance-weighted attribute architecture makes it possible for users to search among items in ways that are more precise and useful than is possible with databases that simply treat attributes as true or false. Using the relevance weights, attribute values are applied to a database item to varying degrees in order to enrich the description of the item and thereby make that item more searchable in the database.

[0045] Searching

[0046] To find a product in the system, the user first selects a category. FIG. 4 is a screen shot of an introductory screen showing a list of categories according to principles of the invention. The categories list includes categories such as appliances and bathroom fixtures and fittings. A further list under categories is the “Rooms” list, including such categories as bathroom and home office. The user selects a category of interest.

[0047] FIG. 5 is a screen shot of the category of bathroom fixtures and fittings selected from the list of FIG. 4. The user browses a selected category until a product of interest is found. The list under bathroom fixtures includes subcategories such as bath accessories and tubs.

[0048] FIG. 6 is a screen shot of the tubs subcategory selected from the list of FIG. 5. This figure shows groups of attributes that apply to the data items under the tubs subcategory. The attributes are presented to the user in predetermined groups based on assumptions of what users might be interested in. The user selects an attribute, in this case an attribute under the groupings of style, price, or brand.

[0049] FIG. 7 is a list of the tubs subcategory having the attribute “Kohler” from the list of FIG. 6, the brand Kohler having been selected by the user. The result of the attribute choice is the list of available tubs from the manufacturer Kohler. The user may now choose a specific item from the database.

[0050] In a simple search, such as the above search on a single attribute, the system presents all items in the database having that attribute; those having the highest relevance weights are listed first, and the rest of the items follow in descending order of relevance weight. For example, if a user is searching for items of the style “Arts & Crafts,” the items that are the most “Arts & Crafts” are displayed first on the list. The attribute value “Kohler” chosen above has no relevance weight, so the resulting list is simply the list of all tub items in the database having that brand.
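The single-attribute ordering of paragraph [0050] can be sketched as a filter followed by a descending sort on relevance weight. This assumes a hypothetical flat catalog keyed by (attribute, value) pairs, and it assumes that values without a relevance weight (None, as with the brand “Kohler”) are placed after weighted ones; the item names are illustrative only.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical flat representation: item name -> {(attribute, value): relevance}.
Catalog = Dict[str, Dict[Tuple[str, str], Optional[int]]]


def simple_search(catalog: Catalog, attribute: str, value: str) -> List[str]:
    """Return every item having attribute=value, highest relevance first.

    Unweighted matches (None) sort after weighted ones. Python's sort is
    stable even with reverse=True, so ties keep catalog order.
    """
    hits = [(name, attrs[(attribute, value)])
            for name, attrs in catalog.items()
            if (attribute, value) in attrs]
    hits.sort(key=lambda hit: -1 if hit[1] is None else hit[1], reverse=True)
    return [name for name, _ in hits]


catalog: Catalog = {
    "lamp A": {("style", "Arts & Crafts"): 2},
    "lamp B": {("style", "Arts & Crafts"): 4},
    "lamp C": {("style", "modern"): 3},
    "lamp D": {("style", "Arts & Crafts"): None},
}
print(simple_search(catalog, "style", "Arts & Crafts"))
# ['lamp B', 'lamp A', 'lamp D']
```

For an unweighted value such as a brand, every key is -1, so the stable sort leaves the matches in catalog order, which matches the plain list described for “Kohler.”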

[0051] FIG. 8 is a screen shot of a tub item selected from the list of FIG. 7, the Kohler Birthday Bath. A photograph of the item is accompanied by a description. In the database, the elements of the description are stored as attributes and values as illustrated above in FIGS. 2 and 3. The values of relevance weighted attributes are assigned relevance weights at the time the item is entered into the database. Also shown in FIG. 8 are the options of saving this item to the user's personal portfolio and the “Find Similar” search, which is described below. To keep the user interface simple, many of the attributes are not normally displayed and are used only for “Find Similar” calculations and other advanced searching. Relevance weights, must match check boxes, and importance weights (described below) may be used to calculate which attributes are exposed on the user interface. For example, the three most noteworthy attributes of a product, or a ranked list of the most noteworthy attributes of a particular product, may be exposed on the user interface.

[0052] The user may perform a “Find Similar” search in order to find other products having qualities like the product the user has already found. FIG. 9 shows the results of a “Find Similar” search on the Kohler tub of FIG. 8. The results are a selection of tubs having similar appearance and other characteristics.

[0053] To produce more precise “Find Similar” results, the system has must match attributes, as shown in FIG. 10. FIG. 10 uses the house example of FIG. 3. “Must match” attributes define which attributes must have values that match exactly in order for an item to be shown in “Find Similar” search results. “Find Similar” results are based on overall similarity, rather than just on similarity to a particular attribute or to a combination of attributes.

[0054] A relative “importance” weighting is assigned to each attribute. These importance weightings are typically assigned using a numerical scale (for example, a scale of blank, where “blank” means a zero weighting, and 1 to 5, where 1 means normal or no extra importance and 5 means five times as important as 1), but could be applied using a variety of metrics.

[0055] The importance weightings are fundamentally different from the relevance weightings used elsewhere in the attribute tagging and matching system. Relevance weightings say how relevant a particular attribute value is for a particular database entry. Importance weightings establish how important that particular attribute should be for calculations of similarity.

[0056] The importance weightings are used to tune search results. For example, the dresser style attribute value “Chippendale” might be given a weighting of 5 and the leg style attribute value “claw-and-ball” might be given a normal weighting of 1. In this case the importance of attribute differences would be five times as great for dresser style as it would be for leg style.

[0057] “Must match” check boxes and importance weightings are first established as defaults for each database category. Then, for each product, these default “must match” checkboxes and importance weightings may be overridden by the user, if desired, in order to tune the results of a “Find Similar” search.

[0058] For example, normally the material a faucet is made of is not as important as its finish. If the material, however, is 24-carat gold, its importance to the order of search results is likely to be dramatically greater and might merit a weighting of as much as 5.
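Paragraphs [0057] and [0058] describe category defaults that a single product may override. A minimal sketch, assuming a hypothetical `Tuning` record per attribute and a simple dictionary merge in which product-level entries replace category entries attribute by attribute; the faucet default values 3 and 1 are invented for illustration, and only the gold override to 5 comes from the text.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Tuning:
    """Per-attribute "Find Similar" settings; importance 0 stands for blank."""
    must_match: bool = False
    importance: int = 0

    def __post_init__(self) -> None:
        if not 0 <= self.importance <= 5:
            raise ValueError("importance must be blank (0) or 1-5")


def effective_tuning(category_defaults: Dict[str, Tuning],
                     product_overrides: Dict[str, Tuning]) -> Dict[str, Tuning]:
    """Product-level settings replace category defaults attribute by attribute."""
    merged = dict(category_defaults)
    merged.update(product_overrides)
    return merged


# Hypothetical faucet defaults: finish matters more than material ...
faucet_defaults = {
    "category": Tuning(must_match=True, importance=1),
    "finish": Tuning(importance=3),
    "material": Tuning(importance=1),
}
# ... but for a 24-carat gold faucet the material is raised to 5.
gold_faucet = effective_tuning(faucet_defaults, {"material": Tuning(importance=5)})
```

Because the defaults dictionary is copied before the update, the category settings stay untouched for every other faucet.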

[0059] Use of must match check boxes and importance weightings enables the system to accommodate important distinctions both at the level of general business rules and at the level of individual products.

[0060] The “Find Similar” search uses similarity of database items to produce search results using the method below. Values and weights are referenced in the database as follows: item[i].attribute[j].value[k] for an attribute value without a relevance weight;

[0061] item[i].attribute[j].value[k].relevance for an attribute value with a relevance weight;

[0062] item[i].attribute[j].mustmatch for a must match attribute; and

[0063] item[i].attribute[j].importance for an attribute having an importance value.

[0064] In a similarity search, a must match setting serves as a “go/no go” gate: the values of each must match attribute must match.

[0065] A similarity metric is calculated for all remaining attributesthat have non-zero importance weightings.

[0066] Attributes with importance weightings of blank (or zero) areignored.
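The gate of paragraph [0064] and the filtering of paragraphs [0065] and [0066] can be sketched as two small functions. This assumes, hypothetically, that an item maps each attribute name to a set of values and that “match” means the two value sets are identical; the specification does not say how multi-valued must-match attributes are compared.

```python
from typing import Dict, Optional, Set

# Hypothetical representation: attribute name -> set of values.
Values = Dict[str, Set[str]]


def passes_must_match(source: Values, target: Values, must_match: Set[str]) -> bool:
    """Go/no-go gate: each must-match attribute needs the same set of values."""
    return all(source.get(a, set()) == target.get(a, set()) for a in must_match)


def scored_attributes(importance: Dict[str, Optional[float]]) -> Set[str]:
    """Keep only attributes whose importance is neither blank (None) nor zero."""
    return {a for a, w in importance.items() if w}


source = {"category": {"chair"}, "color": {"black", "blue"}}
print(passes_must_match(source, {"category": {"chair"}}, {"category"}))  # True
print(scored_attributes({"category": 5, "price": 0, "color": None}))    # {'category'}
```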

[0067] The similarity algorithm for unweighted attributes is as follows:

[0068] Let $S_{mn}$ represent the metric of similarity between item[m] and item[n], where
$S_{mn} = \sum_{j} \mathrm{AttrImp}_{mj} \cdot \mathrm{AttrValSim}_{mnj}$

[0069] where $\mathrm{AttrImp}_{mj}$ is the scaled measure of importance of the $j$th attribute of item $m$ and

[0070] $\mathrm{AttrImp}_{mj} = \left(\texttt{item[m].attribute[j].importance}\right)^{x}$ for any $x > 0$

[0071] and where $\mathrm{AttrValSim}_{mnj}$ is the measure of similarity of the values of a given attribute and

[0072] $\mathrm{AttrValSim}_{mnj} = \sum_{k} \sum_{l} S_{\mathrm{attr}_j}\left(\mathrm{value}_{mjk}, \mathrm{value}_{njl}\right)$

[0073] where $\mathrm{value}_{mjk} = \texttt{item[m].attribute[j].value[k]}$,

[0074] where $\mathrm{value}_{njl} = \texttt{item[n].attribute[j].value[l]}$,

[0075] and where

[0076] $S_{\mathrm{attr}_j}$ is any metric of similarity between all possible values of attribute $j$, e.g. $S_{\mathrm{attr}_j}(a,b) = 1$ if $a = b$, and $0$ otherwise.
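The unweighted formulas of paragraphs [0068] to [0076] translate directly into nested sums. The sketch below assumes a hypothetical representation in which an item maps attribute names to lists of values, uses the exact-match example for $S_{\mathrm{attr}_j}$, and skips attributes with blank or zero importance as paragraph [0066] requires. The dresser-style sample data is illustrative.

```python
from typing import Callable, Dict, List


def exact(a: str, b: str) -> float:
    """Example S_attr_j from [0076]: 1 if the values are equal, 0 otherwise."""
    return 1.0 if a == b else 0.0


def attr_val_sim(values_m: List[str], values_n: List[str],
                 s_attr: Callable[[str, str], float] = exact) -> float:
    """AttrValSim_mnj: sum of S_attr_j over every pair (value_mjk, value_njl)."""
    return sum(s_attr(a, b) for a in values_m for b in values_n)


def similarity(item_m: Dict[str, List[str]], item_n: Dict[str, List[str]],
               importance_m: Dict[str, float], x: float = 1.0) -> float:
    """S_mn = sum over j of AttrImp_mj * AttrValSim_mnj, AttrImp_mj = importance**x."""
    if x <= 0:
        raise ValueError("x must be positive")
    total = 0.0
    for j, imp in importance_m.items():
        if not imp:  # blank or zero importance: attribute is ignored
            continue
        total += (imp ** x) * attr_val_sim(item_m.get(j, []), item_n.get(j, []))
    return total


m = {"material": ["wood", "walnut"], "finish": ["shellac", "satin"]}
n = {"material": ["wood"], "finish": ["satin", "natural"]}
print(similarity(m, n, {"material": 1, "finish": 2}))       # 3.0
print(similarity(m, n, {"material": 1, "finish": 2}, x=2))  # 5.0
```

Raising the exponent $x$ above 1 widens the gap between important and unimportant attributes without changing which attributes are scored.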

[0077] For weighted attributes, the similarity algorithm is as follows:

$S_{\mathrm{attr}_j}(a,b) = 1 - \lvert a.\mathrm{weight} - b.\mathrm{weight} \rvert$
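A sketch of the weighted rule, assuming relevance weights have been rescaled to the range 0 to 1 (tenths, as in the FIG. 11 example) so the result also stays between 0 and 1. Treating a value that is absent from one item as weight 0 follows the FIG. 11 example; the function names are hypothetical.

```python
from typing import Dict


def weighted_value_sim(weight_a: float, weight_b: float) -> float:
    """S_attr_j(a, b) = 1 - |a.weight - b.weight|, weights scaled to [0, 1]."""
    return 1.0 - abs(weight_a - weight_b)


def weighted_attr_sim(src: Dict[str, float], tgt: Dict[str, float]) -> Dict[str, float]:
    """Per-value similarity over the union of values; an absent value has weight 0."""
    return {v: weighted_value_sim(src.get(v, 0.0), tgt.get(v, 0.0))
            for v in sorted(set(src) | set(tgt))}


print(weighted_attr_sim({"Victorian": 0.1, "Modern": 0.8},
                        {"Victorian": 0.5, "French": 0.9}))
```

With floating-point weights the printed values are close to, but not always exactly, 0.1, 0.2 and 0.6; exact tenths are used in the worked total after paragraph [0086].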

[0078] The “find similar” process operates as follows. FIG. 11 has a table of attributes and values of a source item and a table of attributes and values of a target item, which are used to illustrate the find similar process. Chair #1 is a source data item and Chair #2 is a target data item. Each table has a must-match column, an importance column, an attribute column, a values column, and a relevance column. Importance and relevance weights are presented as integer values over ten. Relevance weights in this example apply only to the attribute “style.” Chair #1 has one must-match attribute, which is the category.

[0079] FIG. 12 is a flow chart of the find similar process. The system takes as input a source object having at least one non-trivial or non-null value for an attribute and searches for similar data items using the attributes, values, and weights of the source object. If the source object has any must-match attributes, the system first searches the database for matches of those attributes, block 500. If no matches are found, the search ends, block 505.

[0080] If matches for the must-match attributes are found, the system calculates similarity measures, block 510. In the present embodiment of the invention, similarity is calculated using the unweighted similarity algorithm for the unweighted values and the weighted similarity algorithm for the weighted values, as follows:

Attribute            Result                     Similarity
Category chair       matches                    1
Price                does not match             0
Color black          matches                    1
Color blue           does not match             0
Style Victorian      matches to a degree        $1 - \lvert \frac{1}{10} - \frac{5}{10} \rvert = \frac{6}{10}$
Style Modern         not present in target      $1 - \lvert \frac{8}{10} - \frac{0}{10} \rvert = \frac{2}{10}$
Style Traditional    not present in target      $1 - \lvert \frac{3}{10} - \frac{0}{10} \rvert = \frac{7}{10}$
Style French         not present in source      $1 - \lvert \frac{0}{10} - \frac{9}{10} \rvert = \frac{1}{10}$

[0085] Similarity values are calculated for all attributes, even where the values are blank or zero. The results may need to be normalized under some circumstances.

[0086] The similarity values for attributes are multiplied by the importance weights:

Attribute            Similarity × Importance
Category chair       (1)(5/10) = 0.50
Price                (0)(5/10) = 0.00
Color black          (1)(5/10) = 0.50
Color blue           (0)(5/10) = 0.00
Style Victorian      (6/10)(8/10) = 0.48
Style Modern         (2/10)(8/10) = 0.16
Style Traditional    (7/10)(8/10) = 0.56
Style French         (1/10)(8/10) = 0.08
Total                2.28
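The arithmetic of the FIG. 11 example can be checked with exact fractions. The sketch reads the source and target style relevances off the equations in paragraphs [0080] to [0084], assuming the first term in each is the source (Chair #1) and the second the target (Chair #2); the importance weights 5/10 and 8/10 come from the table in paragraph [0086].

```python
from fractions import Fraction as F

# Unweighted rows: exact match gives 1, mismatch gives 0.
unweighted = {
    "Category chair": 1,
    "Price": 0,
    "Color black": 1,
    "Color blue": 0,
}
# Weighted style rows: (source relevance, target relevance); absent means 0.
style = {
    "Style Victorian": (F(1, 10), F(5, 10)),
    "Style Modern": (F(8, 10), F(0)),
    "Style Traditional": (F(3, 10), F(0)),
    "Style French": (F(0), F(9, 10)),
}
IMPORTANCE_OTHER = F(5, 10)
IMPORTANCE_STYLE = F(8, 10)


def weighted_sim(a: F, b: F) -> F:
    """Weighted rule: 1 - |a.weight - b.weight|."""
    return 1 - abs(a - b)


contributions = {name: sim * IMPORTANCE_OTHER for name, sim in unweighted.items()}
contributions.update(
    {name: weighted_sim(s, t) * IMPORTANCE_STYLE for name, (s, t) in style.items()})
total = sum(contributions.values())
print(float(total))  # 2.28
```

Using `Fraction` keeps every intermediate value exact, so the total matches the table's 2.28 without rounding.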

[0087] The weighted similarities are added to yield the overall similarity value, block 515. In the present embodiment of the invention, the similarities are generally converted to percentages. This is done by finding the similarity value an object has to itself and using that value as a divisor. Using the above-described method, similarities are calculated for each item found in the must-match search. The list of items is then sorted by degree of similarity, block 520.

[0088] The list of similar items is then displayed to the user, block 525. If there are many similar items, only a predetermined number of matches are displayed to the user, for example, the ten most similar chairs.
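The FIG. 12 flow (blocks 500 to 525) can be sketched end to end. This is a minimal sketch under stated assumptions: items are hypothetical dictionaries of single values, the similarity function is passed in (a toy count of equal attributes stands in for $S_{mn}$), and an error is raised when the source's self-similarity is zero, since the percentage conversion of paragraph [0087] would otherwise divide by zero.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical representation: attribute name -> value.
Item = Dict[str, object]


def find_similar(source: Item, candidates: Dict[str, Item], must_match: List[str],
                 similarity: Callable[[Item, Item], float],
                 top_n: int = 10) -> List[Tuple[str, float]]:
    """Sketch of FIG. 12: gate, score, normalize, sort, and truncate."""
    # Block 500: keep candidates whose must-match attributes equal the source's.
    pool = {name: item for name, item in candidates.items()
            if all(item.get(a) == source.get(a) for a in must_match)}
    if not pool:  # Block 505: no matches, so the search ends.
        return []
    self_similarity = similarity(source, source)
    if self_similarity == 0:
        raise ValueError("source has no attributes that contribute to similarity")
    # Blocks 510-515: overall similarity as a percentage of self-similarity.
    scored = [(name, 100.0 * similarity(source, item) / self_similarity)
              for name, item in pool.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # Block 520
    return scored[:top_n]  # Block 525: only the top_n items are displayed


def count_matches(a: Item, b: Item) -> float:
    """Toy stand-in for S_mn: number of attributes with equal values."""
    return float(sum(1 for key, value in a.items() if b.get(key) == value))


source = {"category": "chair", "color": "black", "style": "Victorian"}
candidates = {
    "chair 2": {"category": "chair", "color": "black", "style": "Modern"},
    "chair 3": {"category": "chair", "color": "blue", "style": "Modern"},
    "table 1": {"category": "table", "color": "black", "style": "Victorian"},
}
results = find_similar(source, candidates, ["category"], count_matches)
```

Here “table 1” shares two values with the source but is excluded at block 500, which is the point of the must-match gate; the default `top_n=10` mirrors the “ten most similar chairs” example.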

[0089] An alternative method of determining similarity first calculates the Euclidean distance between the source and target points, combining the differences for each attribute. For the source item and each potential target item having n attributes, all n attributes and values are mapped into n-space. The similarity is the inverse of the distance between the mapped points.

The user portfolio provides further enhancement to the searching capabilities of the database. FIG. 13 is a diagram of objects saved to a My Portfolio file mapped to an object in the database that is not in the My Portfolio file. The user has qualified each object in the My Portfolio file with the additional attribute of user preference and the values “Love,” “Like,” or “Hate.” These values are given the numerical values 2, 1 and −1, respectively. When a “Find Similar” search is performed on objects similar to those in the My Portfolio file, the similarity values are further refined using the user preference attribute values. For example, if object P is being examined for similarity, the similarities between P and the objects in the My Portfolio folder are multiplied by the user preference values. The results are added together to give the similarity value of P according to the preferences of the particular user. In this way, the My Portfolio folder is used to refine the search of the database.
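Both ideas above can be sketched together: an inverse-distance similarity in n-space and the portfolio-weighted score for a candidate object P. Two assumptions are labeled plainly. Items are already mapped to numeric points (the mapping itself is not specified). The similarity is taken as 1 / (1 + distance) rather than 1 / distance, a substituted variant that avoids division by zero for identical points. `math.dist` requires Python 3.8 or later.

```python
import math
from typing import List, Sequence, Tuple

# Numerical values for the user-preference attribute, from the text.
PREFERENCE = {"Love": 2, "Like": 1, "Hate": -1}


def inverse_distance_similarity(p: Sequence[float], q: Sequence[float]) -> float:
    """Similarity from Euclidean distance in n-space, as 1 / (1 + distance)."""
    if len(p) != len(q):
        raise ValueError("both items must map to the same n-space")
    return 1.0 / (1.0 + math.dist(p, q))


def portfolio_score(candidate: Sequence[float],
                    portfolio: List[Tuple[Sequence[float], str]]) -> float:
    """Sum of similarity to each portfolio item times that item's preference."""
    return sum(PREFERENCE[rating] * inverse_distance_similarity(candidate, point)
               for point, rating in portfolio)


# Hypothetical two-attribute items.
portfolio = [((3.0, 4.0), "Love"), ((0.0, 0.0), "Hate")]
score = portfolio_score((0.0, 0.0), portfolio)
```

In this example P sits on top of a “Hate” item and five units from a “Love” item, so the negative preference dominates and the score is about −0.67; any other similarity function could be substituted for the distance-based one.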

[0090] It is to be understood that the above-described embodiments are simply illustrative of the principles of the invention. Various and other modifications and changes may be made by those skilled in the art which will embody the principles of the invention and fall within the spirit and scope thereof.

What is claimed is:
 1. A database management system, comprising: a storage system wherein a plurality of data items, each having a plurality of associated attributes, each is linked logically to stored values for each of said associated attributes and to a weight for at least one of said associated attributes; and a search system that, given a first data item having at least one non-trivial value for one of said plurality of associated attributes, identifies among said plurality of data items in said storage system, a second data item having attributes, values and weights similar to those of said first data item.
 2. A database management system, comprising: a storage system having a data structure to store data items having a plurality of associated attributes, each of said plurality of associated attributes having a value, at least one selected attribute of said plurality having a weight; and a search system to search the database for a target data item having similar attributes, values and weights as a source data item.
 3. The database management system of claim 2 wherein said attributes further comprise a general attribute type and a specific attribute type.
 4. The database management system of claim 2 wherein said attributes further comprise general and unweighted attributes, category-specific and unweighted attributes, general and weighted attributes, and category-specific and weighted attributes.
 5. The database management system of claim 4 wherein said attributes further comprise at least one must-match attribute.
 6. The database management system of claim 2 wherein at least one attribute has a plurality of values.
 7. A method of managing a database, comprising the steps of: storing data items in the database; associating a plurality of attributes with each data item, each attribute having at least one value; providing a weight for at least one selected attribute; receiving a search request having a plurality of search attributes, each said search attribute having at least one value and at least one attribute having a weight; and searching the database in response to said request using said plurality of search attributes to find data items matching said search attributes.
 8. The method of claim 7 wherein said searching step further comprises searching the database in response to said request using said plurality of search attributes to find data items having attributes similar to said search attributes.
 9. The method of claim 8 further comprising the steps of: receiving at least one must-match attribute in said search request; and searching the database in response to said request using said plurality of search attributes to find data items matching said at least one must-match attribute; and if data items matching said at least one must-match attribute are found, searching the database using the remaining search attributes to find similar data items; and if no data items matching said at least one must-match attribute are found, ending the search.
 10. The method of claim 8 further comprising the steps of: receiving at least one importance weight associated with an attribute in said search request; and searching the database in response to said request using said plurality of search attributes and said at least one importance weight to find data items having similar attributes to said search attributes.